
[fix](fe) Handle generated columns in delete partial update#64884

Open
bobhan1 wants to merge 1 commit into apache:master from bobhan1:fix/cir-20559-delete-generated-column

bobhan1 (Contributor) commented Jun 26, 2026

What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Original Problem: DELETE failed during analysis on a unique merge-on-write table that contains a generated column when light delete is disabled. The original reported table had a generated VARIANT column:

```sql
new_col variant AS (receive_address_detail) NULL
```

and the DELETE failed with:

```
errCode = 2, detailMessage = cannot find column from target table /receive_address_detail
```

A minimized reproduction is:

```sql
create table test_gen_col_mow_delete_partial_update (
    a int,
    b int,
    c int as (b + 1),
    d int
)
unique key(a)
distributed by hash(a) buckets 1
properties (
    "enable_unique_key_merge_on_write" = "true",
    "enable_mow_light_delete" = "false",
    "replication_num" = "1"
);
```

After inserting rows, running:

```sql
delete from test_gen_col_mow_delete_partial_update where a = 1;
```

failed with:

```
java.sql.SQLException: errCode = 2, detailMessage = cannot find column from target table [b]
```

Problem Summary: DELETE on a unique merge-on-write table with light delete disabled is rewritten as a partial update load that writes the key columns and the delete sign. BindSink used to add every generated column to the sink and then recompute each generated column that the child plan did not output. Recomputing a generated column requires the ordinary value columns its expression references (for example, b for c int as (b + 1)), and those columns are not part of the DELETE output, so analysis failed.
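To make the failure concrete, the sketch below models the old binding rule as the summary describes it and runs it on the reproduction table. Everything in it is hypothetical: `OldDeleteSinkSketch`, the `Column` record and `bindSinkOld` are illustrative stand-ins rather than the real BindSink code, and generated columns that the child does emit are simply passed through, since the summary does not say how the old code treated them. `__DORIS_DELETE_SIGN__` stands for the hidden delete-sign column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the pre-fix binding rule; not the actual Doris BindSink code.
final class OldDeleteSinkSketch {
    // A target-table column; deps lists the columns a generated expression reads.
    record Column(String name, boolean generated, List<String> deps) {}

    // Old rule (per the PR summary): every generated column goes into the sink, and one
    // the child did not emit is recomputed, so all of its dependencies must be present.
    static List<String> bindSinkOld(List<Column> schema, Set<String> childOutput) {
        List<String> sink = new ArrayList<>();
        for (Column col : schema) {
            if (childOutput.contains(col.name())) {
                sink.add(col.name());            // emitted by the child: pass through
            } else if (col.generated()) {
                for (String dep : col.deps()) {  // recomputation needs every dependency
                    if (!childOutput.contains(dep)) {
                        // Mirrors the error text from the report.
                        throw new IllegalStateException(
                                "cannot find column from target table [" + dep + "]");
                    }
                }
                sink.add(col.name());
            }
            // Ordinary columns the child omits are left out (partial update).
        }
        return sink;
    }

    // Schema of test_gen_col_mow_delete_partial_update plus the hidden delete sign.
    static List<Column> reproSchema() {
        return List.of(
                new Column("a", false, List.of()),
                new Column("b", false, List.of()),
                new Column("c", true, List.of("b")),   // c int as (b + 1)
                new Column("d", false, List.of()),
                new Column("__DORIS_DELETE_SIGN__", false, List.of()));
    }

    public static void main(String[] args) {
        // DELETE ... WHERE a = 1 emits only the key and the delete sign.
        Set<String> deleteOutput = Set.of("a", "__DORIS_DELETE_SIGN__");
        try {
            bindSinkOld(reproSchema(), deleteOutput);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `cannot find column from target table [b]`, the same shape as the reported error: recomputing `c` needs `b`, which the DELETE never produces.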

This change treats DELETE partial updates specially: generated columns that are not emitted by the child plan are skipped, and generated columns that are emitted by the child plan use that child output directly. Normal partial update generated-column dependency checks are unchanged.
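The fixed rule can be sketched the same way, again under assumptions: `DeleteSinkSketch`, `Column` and `bindSink` are hypothetical, and the `deletePartialUpdate` parameter only stands in for however the patch signals the DELETE case (the review mentions external sink call sites passing `false`, but the real parameter name and shape may differ). Emitted generated columns bind to the child output, omitted ones are skipped for DELETE, and the normal partial update path still recomputes them and checks their dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the fixed binding rule; not the actual Doris BindSink code.
final class DeleteSinkSketch {
    // A target-table column; deps lists the columns a generated expression reads.
    record Column(String name, boolean generated, List<String> deps) {}

    static List<String> bindSink(List<Column> schema, Set<String> childOutput,
                                 boolean deletePartialUpdate) {
        List<String> sink = new ArrayList<>();
        for (Column col : schema) {
            if (childOutput.contains(col.name())) {
                // Emitted by the child (e.g. a generated key column projected by DELETE):
                // bind to the child's value instead of recomputing it.
                sink.add(col.name());
            } else if (col.generated()) {
                if (deletePartialUpdate) {
                    continue;  // DELETE partial update: skip generated columns the child omits
                }
                // Normal partial update is unchanged: recompute, so dependencies must exist.
                for (String dep : col.deps()) {
                    if (!childOutput.contains(dep)) {
                        throw new IllegalStateException(
                                "cannot find column from target table [" + dep + "]");
                    }
                }
                sink.add(col.name());
            }
            // Ordinary columns the child omits are left out (partial update).
        }
        return sink;
    }

    public static void main(String[] args) {
        List<Column> schema = List.of(
                new Column("a", false, List.of()),
                new Column("b", false, List.of()),
                new Column("c", true, List.of("b")),   // c int as (b + 1)
                new Column("d", false, List.of()),
                new Column("__DORIS_DELETE_SIGN__", false, List.of()));
        // The DELETE from the reproduction now binds without touching c or b.
        System.out.println(bindSink(schema, Set.of("a", "__DORIS_DELETE_SIGN__"), true));
    }
}
```

`main` prints `[a, __DORIS_DELETE_SIGN__]`. A generated key column (for instance a hypothetical `k2 int as (a + 1)` in the key) is projected by DELETE like any other key, so it takes the first branch and is kept; only generated value columns that the DELETE never produces are dropped.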

The tests cover generated value columns, generated key columns, a generated value-column table with an omitted NOT NULL value column that has no default value, and the original VARIANT generated-column shape based on receive_address_detail.

Release note

Fix DELETE failure on unique merge-on-write tables with generated columns when light delete is disabled.

Check List (For Author)

  • Test: Regression test / Unit Test
    • ./build.sh --fe -j100
    • ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.DeleteFromUsingCommandTest
    • ./run-regression-test.sh --run -d regression-test/suites/ddl_p0/test_create_table_generated_column -s test_generated_column_delete -forceGenOut
    • ./run-regression-test.sh --run -d regression-test/suites/ddl_p0/test_create_table_generated_column -s test_generated_column_delete
    • ./run-regression-test.sh --run -d regression-test/suites/ddl_p0/test_create_table_generated_column -s test_partial_update_generated_column
  • Behavior changed: Yes (DELETE now succeeds for unique merge-on-write tables with generated columns when light delete is disabled.)
  • Does this need documentation: No

hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

bobhan1 force-pushed the fix/cir-20559-delete-generated-column branch from a37d300 to 6ff668a on June 26, 2026 07:54
bobhan1 force-pushed the fix/cir-20559-delete-generated-column branch from 6ff668a to b102fc9 on June 26, 2026 08:07
bobhan1 marked this pull request as ready for review on June 26, 2026 08:08
bobhan1 (Contributor, Author) commented Jun 26, 2026

run buildall

bobhan1 (Contributor, Author) commented Jun 26, 2026

/review

github-actions (Bot, Contributor) left a comment

Summary: I did not find any blocking issue in this PR. The change is narrowly scoped to DELETE partial-update binding for generated columns, and the added FE/regression coverage exercises generated value columns, generated key columns, NOT NULL value columns, and a VARIANT generated-column table with enable_mow_light_delete=false.

Critical checkpoint conclusions:

  • Goal/test coverage: The PR fixes MoW DELETE partial-update analysis for generated columns by not forcing generated value columns into the sink column list when the DELETE child does not produce them, while preserving generated key columns that are produced by the DELETE projection. Added FE analyzer tests and regression expected output cover the main cases.
  • Scope/focus: The implementation is limited to OLAP sink binding and passes false through external sink call sites, so connector/Hive/Iceberg/MaxCompute insert behavior is unchanged.
  • Concurrency/lifecycle: No new shared state, locking, threads, or lifecycle-managed objects are introduced.
  • Configuration/compatibility: No new config, protocol field, storage format, or mixed-version behavior is introduced. Existing enable_mow_light_delete=false behavior is covered by tests.
  • Parallel paths: Simple DELETE and DELETE USING both reach the delete-as-insert OLAP sink path for this mode. Cluster-key and sync-MV cases disable partial update before this shortcut applies.
  • Data correctness: Generated key columns remain in the child output because DeleteFromCommand.completeQueryPlan projects key columns; generated value columns without child output are skipped so they are not incorrectly recomputed from missing base value columns or marked in the partial-update input set.
  • Tests/results: The regression outputs are ordered and match schema order. Tables are dropped before use and hardcoded names are used.
  • Observability/performance: No new logging/metrics are needed for this planner binding fix; the change avoids unnecessary generated expression analysis in the DELETE partial-update path.
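The parallel-paths checkpoint implies a simple gate for when the new skip can take effect. A hedged sketch: `DeletePathSketch`, `TableInfo` and `usesDeletePartialUpdate` are hypothetical names, not Doris APIs; the conditions come from the PR description (unique merge-on-write, `enable_mow_light_delete` = false) and from the review's note that cluster-key and sync-MV tables disable partial update first. It says nothing about how those other tables are handled instead.

```java
// Hypothetical predicate; the field names are illustrative, not Doris table properties or APIs.
final class DeletePathSketch {
    record TableInfo(boolean uniqueKeyMergeOnWrite, boolean mowLightDelete,
                     boolean hasClusterKeys, boolean hasSyncMv) {}

    // True when a DELETE would reach the partial update sink in which
    // generated columns omitted by the child are skipped.
    static boolean usesDeletePartialUpdate(TableInfo t) {
        return t.uniqueKeyMergeOnWrite()
                && !t.mowLightDelete()      // enable_mow_light_delete = false
                && !t.hasClusterKeys()      // cluster keys turn partial update off first
                && !t.hasSyncMv();          // so do synchronous materialized views
    }

    public static void main(String[] args) {
        // The reproduction table: MoW, light delete off, no cluster keys, no sync MV.
        TableInfo repro = new TableInfo(true, false, false, false);
        System.out.println(usesDeletePartialUpdate(repro));  // prints "true"
    }
}
```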

Subagent conclusions:

  • optimizer-rewrite: no candidate findings; convergence round 1 returned NO_NEW_VALUABLE_FINDINGS for the final empty comment set.
  • tests-session-config: no candidate findings; convergence round 1 returned NO_NEW_VALUABLE_FINDINGS for the final empty comment set.
  • No candidates were accepted, dismissed as duplicates, or submitted as inline comments.

Validation performed:

  • Reviewed the GitHub changed-file list and all four changed files plus the relevant DELETE, sink binding, translator, generated-column, and BE partial-update validation paths.
  • Verified no existing inline review comments were present.
  • Ran git diff --check on the explicit base/head changed-file range; it passed.

Validation not run:

  • FE unit tests and regression tests were not executed locally because this checkout is not worktree-initialized and thirdparty/installed/bin/protoc is missing.

hello-stephen (Contributor)

TPC-H: Total hot run time: 29022 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b102fc91bc14b53994b6b54a14936ff4fd1c27c8, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17674	4080	3982	3982
q2	2005	317	187	187
q3	10374	1403	865	865
q4	4682	472	334	334
q5	7506	851	568	568
q6	179	170	133	133
q7	775	839	631	631
q8	9434	1689	1536	1536
q9	5503	4442	4481	4442
q10	6782	1781	1541	1541
q11	450	273	244	244
q12	630	418	296	296
q13	18069	3392	2727	2727
q14	269	258	245	245
q15
q16	793	786	714	714
q17	983	1038	933	933
q18	7163	5805	5591	5591
q19	1250	1340	1119	1119
q20	503	404	261	261
q21	5834	2626	2373	2373
q22	440	363	300	300
Total cold run time: 101298 ms
Total hot run time: 29022 ms
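The totals can be recomputed from the rows, which also pins down what the columns mean. This reading is inferred from the numbers rather than from the tpch-tools scripts: the first column sums to the cold total, and the hot total is the sum of the faster of the two hot runs (the fourth column). A small sketch over the Round 1 rows (`TpchTotals` is just an illustrative name):

```java
// Recomputes the Round 1 totals from the per-query rows above.
// Assumption (inferred from the numbers): column 1 is the cold run, columns 2 and 3
// are hot runs, and the reported hot total sums the faster hot run per query.
final class TpchTotals {
    // {cold, hot1, hot2} in ms for q1..q14 and q16..q22; q15 has no timings in the report.
    static final long[][] ROUND1 = {
        {17674, 4080, 3982}, {2005, 317, 187}, {10374, 1403, 865}, {4682, 472, 334},
        {7506, 851, 568}, {179, 170, 133}, {775, 839, 631}, {9434, 1689, 1536},
        {5503, 4442, 4481}, {6782, 1781, 1541}, {450, 273, 244}, {630, 418, 296},
        {18069, 3392, 2727}, {269, 258, 245}, {793, 786, 714}, {983, 1038, 933},
        {7163, 5805, 5591}, {1250, 1340, 1119}, {503, 404, 261}, {5834, 2626, 2373},
        {440, 363, 300},
    };

    static long coldTotal(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) sum += r[0];                 // first (cold) run
        return sum;
    }

    static long hotTotal(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) sum += Math.min(r[1], r[2]); // faster of the two hot runs
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(coldTotal(ROUND1) + " " + hotTotal(ROUND1));  // prints "101298 29022"
    }
}
```

The output matches the reported Round 1 totals; plugging in the Round 2 rows gives 57704 and 52204 ms the same way.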

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4317	4243	4255	4243
q2	321	343	216	216
q3	4608	4947	4458	4458
q4	2083	2172	1388	1388
q5	4440	4313	4286	4286
q6	239	181	129	129
q7	1699	1639	1762	1639
q8	2745	2210	2174	2174
q9	8331	8401	8069	8069
q10	4808	4713	4235	4235
q11	566	420	384	384
q12	783	751	570	570
q13	3244	3596	2946	2946
q14	299	303	292	292
q15
q16	723	762	667	667
q17	1363	1343	1316	1316
q18	7960	7233	7283	7233
q19	1148	1124	1116	1116
q20	2230	2277	1962	1962
q21	5282	4564	4465	4465
q22	515	492	416	416
Total cold run time: 57704 ms
Total hot run time: 52204 ms

hello-stephen (Contributor)

TPC-DS: Total hot run time: 171671 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b102fc91bc14b53994b6b54a14936ff4fd1c27c8, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4318	632	485	485
query6	439	192	199	192
query7	4817	560	279	279
query8	336	179	165	165
query9	8742	4116	4117	4116
query10	460	317	259	259
query11	5904	2325	2153	2153
query12	156	103	104	103
query13	1251	578	430	430
query14	6275	5362	4993	4993
query14_1	4345	4297	4340	4297
query15	217	214	186	186
query16	1016	465	456	456
query17	959	735	596	596
query18	2445	487	359	359
query19	212	192	156	156
query20	122	106	108	106
query21	221	143	123	123
query22	13662	13655	13451	13451
query23	17448	16594	16174	16174
query23_1	16344	16265	16345	16265
query24	7651	1746	1279	1279
query24_1	1324	1296	1303	1296
query25	537	423	366	366
query26	1293	315	165	165
query27	2750	538	343	343
query28	4435	2018	2009	2009
query29	1075	616	486	486
query30	308	228	200	200
query31	1120	1086	958	958
query32	119	60	65	60
query33	512	318	240	240
query34	1189	1135	663	663
query35	772	777	670	670
query36	1388	1403	1239	1239
query37	155	100	90	90
query38	1883	1722	1661	1661
query39	926	931	898	898
query39_1	872	878	875	875
query40	219	121	100	100
query41	65	63	62	62
query42	89	86	90	86
query43	325	321	287	287
query44	1462	782	783	782
query45	211	187	182	182
query46	1068	1199	732	732
query47	2340	2351	2254	2254
query48	384	396	303	303
query49	587	428	320	320
query50	1022	372	260	260
query51	4394	4447	4346	4346
query52	80	79	70	70
query53	246	266	193	193
query54	264	217	219	217
query55	74	73	64	64
query56	247	213	211	211
query57	1412	1397	1284	1284
query58	249	212	204	204
query59	1560	1660	1441	1441
query60	283	248	227	227
query61	187	141	147	141
query62	698	635	577	577
query63	234	190	194	190
query64	2546	764	606	606
query65	4850	4799	4810	4799
query66	1816	468	352	352
query67	28924	28853	28682	28682
query68	3203	1468	963	963
query69	412	307	258	258
query70	1086	985	950	950
query71	296	227	210	210
query72	2850	2625	2396	2396
query73	886	791	451	451
query74	5124	4958	4798	4798
query75	2591	2566	2154	2154
query76	2329	1186	759	759
query77	351	395	305	305
query78	12539	12424	11819	11819
query79	1252	1131	738	738
query80	527	469	391	391
query81	456	277	243	243
query82	239	155	119	119
query83	363	278	253	253
query84	310	146	118	118
query85	854	521	403	403
query86	369	305	283	283
query87	1860	1819	1764	1764
query88	3732	2828	2771	2771
query89	419	387	328	328
query90	1955	182	178	178
query91	168	162	130	130
query92	63	60	53	53
query93	1435	1496	987	987
query94	547	353	311	311
query95	654	483	346	346
query96	1019	830	338	338
query97	2688	2687	2562	2562
query98	213	201	199	199
query99	1180	1152	1036	1036
Total cold run time: 256128 ms
Total hot run time: 171671 ms

hello-stephen (Contributor)

ClickBench: Total hot run time: 25.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b102fc91bc14b53994b6b54a14936ff4fd1c27c8, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.00	0.00	0.01
query2	0.09	0.05	0.05
query3	0.25	0.13	0.14
query4	1.62	0.14	0.15
query5	0.24	0.24	0.24
query6	1.24	1.10	1.06
query7	0.03	0.01	0.01
query8	0.05	0.03	0.04
query9	0.40	0.30	0.31
query10	0.54	0.55	0.54
query11	0.19	0.14	0.15
query12	0.19	0.15	0.14
query13	0.47	0.47	0.48
query14	1.01	1.00	0.99
query15	0.60	0.58	0.59
query16	0.33	0.32	0.31
query17	1.10	1.12	1.07
query18	0.24	0.22	0.22
query19	2.09	1.87	1.94
query20	0.02	0.02	0.02
query21	15.44	0.20	0.13
query22	4.88	0.06	0.05
query23	16.16	0.33	0.12
query24	3.01	0.42	0.31
query25	0.10	0.06	0.04
query26	0.73	0.22	0.15
query27	0.04	0.04	0.04
query28	3.45	0.86	0.51
query29	12.48	4.27	3.46
query30	0.27	0.15	0.16
query31	2.77	0.58	0.31
query32	3.22	0.61	0.50
query33	3.13	3.18	3.21
query34	15.67	4.26	3.53
query35	3.55	3.53	3.53
query36	0.55	0.43	0.43
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.17	0.16	0.14
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.03
Total cold run time: 96.67 s
Total hot run time: 25.11 s

hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 62.50% (10/16) 🎉
Increment coverage report
Complete coverage report

hello-stephen (Contributor)

FE Regression Coverage Report

Increment line coverage 6.10% (13/213) 🎉
Increment coverage report
Complete coverage report
